6 research outputs found

    Continuous Top-k Queries over Real-Time Web Streams

    The Web has become a large-scale real-time information system, forcing us to revise both how to effectively assess the relevance of information for a user and how to efficiently implement information retrieval and dissemination functionality. To increase information relevance, real-time Web applications such as Twitter and Facebook extend content and social-graph relevance scores with "real-time" user-generated events (e.g. re-tweets, replies, likes). To accommodate high arrival rates of information items and user events, we explore a publish/subscribe paradigm in which we index queries and update their results on the fly each time a new item or relevant event arrives. In this setting, we need to process continuous top-k text queries combining both static and dynamic scores. To the best of our knowledge, this is the first work addressing how non-predictable, dynamic scores can be handled in a continuous top-k query setting.
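
    The publish/subscribe idea described in this abstract can be illustrated with a minimal sketch. It is not the authors' index structure: the class `ContinuousTopK`, its method names, the term-overlap static score, and the additive event weight are all assumptions made for illustration. Queries are indexed by term, each arriving item is matched only against queries sharing a term with it, and user events raise an item's dynamic score afterwards.

```python
import heapq
from collections import defaultdict


class ContinuousTopK:
    """Toy publish/subscribe matcher (hypothetical, for illustration only)."""

    def __init__(self):
        self.queries = {}                   # query_id -> (set of terms, k)
        self.term_index = defaultdict(set)  # term -> ids of queries containing it
        self.matches = defaultdict(dict)    # query_id -> {item_id: static score}
        self.dynamic = defaultdict(float)   # item_id -> accumulated event score

    def subscribe(self, query_id, terms, k):
        """Index a continuous query under each of its terms."""
        self.queries[query_id] = (set(terms), k)
        for term in terms:
            self.term_index[term].add(query_id)

    def publish(self, item_id, terms):
        """Match a new item against the indexed queries only."""
        item_terms = set(terms)
        candidates = set()
        for term in item_terms:
            candidates |= self.term_index.get(term, set())
        for query_id in candidates:
            query_terms, _ = self.queries[query_id]
            # Assumed static score: fraction of query terms found in the item.
            overlap = len(query_terms & item_terms)
            self.matches[query_id][item_id] = overlap / len(query_terms)

    def event(self, item_id, weight):
        """Record a user event (e.g. a like) as an additive dynamic score."""
        self.dynamic[item_id] += weight

    def top_k(self, query_id):
        """Return the ids of the k best items by static + dynamic score."""
        _, k = self.queries[query_id]
        scored = (
            (static + self.dynamic[item_id], item_id)
            for item_id, static in self.matches[query_id].items()
        )
        return [item_id for _, item_id in heapq.nlargest(k, scored)]


engine = ContinuousTopK()
engine.subscribe("q1", ["stream", "query"], k=1)
engine.publish("a", ["stream", "query", "web"])  # matches both query terms
engine.publish("b", ["stream", "rss"])           # matches one query term
before = engine.top_k("q1")
engine.event("b", 1.0)                           # an event promotes item "b"
after = engine.top_k("q1")
```

    This sketch recomputes the ranking on every read, which is simple but not scalable; the point is only that an unpredictable event can reorder results that were already computed from static scores.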

    Processing continuous text queries featuring non-homogeneous scoring functions

    In this work we are interested in the scalable processing of content filtering queries over text item streams. In particular, we aim to generalize state-of-the-art solutions with non-homogeneous scoring functions that combine query-independent item importance with query-dependent content relevance. While such complex ranking functions are widely used in web search engines, this is to our knowledge the first scientific work studying their use in a continuous query scenario. Our main contribution is the definition and evaluation of new, efficient in-memory data structures for indexing continuous top-k queries, based on an original two-dimensional representation of text queries. We explore locally optimal score bounds and heuristics that efficiently prune the search space of candidate top-k query results which have to be updated on the arrival of new stream items. Finally, we experimentally evaluate the memory/matching-time trade-offs of these index structures. In particular, we experimentally illustrate their linear scaling behavior with respect to the number of indexed queries.

    Everything you would like to know about RSS feeds and you are afraid to ask

    We are witnessing the widespread adoption of web syndication technologies such as RSS or Atom for the timely delivery of frequently updated Web content. Almost every personal weblog, news portal, or discussion forum nowadays employs RSS/Atom feeds to enhance traditional pull-oriented searching and browsing of web pages with push-oriented delivery of web content. Social media applications such as Twitter or Facebook also employ RSS for notifying users about newly available posts of their preferred friends (or followees). Unfortunately, previous work on the statistical characteristics of RSS/Atom does not provide a precise and up-to-date characterization of feeds' behavior and content, a characterization which can be used to benchmark the effectiveness and efficiency of various RSS/Atom processing and analysis techniques. In this paper, we present the first thorough analysis of three complementary features of real-scale RSS/Atom feeds, namely publication activity, item structure and length, and the vocabulary of the textual content, which we believe are crucial for Web 2.0 applications.

    RSS feeds behavior analysis, structure and vocabulary

    Web syndication technologies such as RSS or Atom are present everywhere on the Web, supporting the timely delivery of frequently updated Web content. Almost every personal weblog, news portal, or discussion forum nowadays employs RSS/Atom feeds to enhance the traditional pull-oriented searching and browsing of web pages with push-oriented delivery of web content. Social media applications such as Twitter or Facebook also offer RSS for notifying users about newly available items of their preferred friends (or followees). Unfortunately, previous work on the statistical characteristics of RSS/Atom does not provide a precise and up-to-date characterization of feeds' behavior and content, a characterization that can be used to benchmark the effectiveness and efficiency of various web syndication processing and analysis techniques. In this paper, we present a thorough analysis of three complementary features of real-scale RSS/Atom feeds, namely publication activity, item characteristics, and textual vocabulary, which we believe are crucial for emerging Web 2.0 applications.